[57] ABSTRACT

An integrated media image information storage and retrieval
system processes information supplied by different types of 
media. A processor-based network server operates as a 
system interface between one or more user control terminals, 
a media image capture station through which media image 
input/output devices are coupled to the network server, and 
a memory for storing media image files to be retrieved for 
reproduction. A supervisory media image manipulation and 
control program is accessed through a supervisory graphical 
user interface at any user control terminal, and has embed- 
ded subordinate media image manipulation programs for 
different types of media and information formats. When 
using the interface to import information from an arbitrary 
medium, the user is able to generate a first, index storage file, 
and a supplemental text description-based file, so as to 
facilitate rapid retrieval of any type of data, regardless of its 
original format (e.g. text, picture, text-picture combination, 
video, audio) and regardless of the capture medium or 
source from which it is imported into the system.

12 Claims, 3 Drawing Sheets 
SYSTEM FOR STORAGE AND RETRIEVAL
OF DIVERSE TYPES OF INFORMATION 
OBTAINED FROM DIFFERENT MEDIA 
SOURCES WHICH INCLUDES VIDEO, 
AUDIO, AND TEXT TRANSCRIPTIONS 

FIELD OF THE INVENTION 

The present invention relates in general to information 
storage and retrieval systems, and is particularly directed to 
a new and improved system for efficiently storing multiple 
types of media image information, including but not limited 
to text, still images, animation, graphics, video, and audio, 
derivable from a variety of media image sources, such as 
computer data base files, hard copy print media, 
photographs, audio cassettes, video camera, etc., and also 
rapidly accessing any piece of media image information for 
reproduction on an output device, such as a user display 
terminal or printer.

BACKGROUND OF THE INVENTION 

Continuing improvements in information gathering and 
processing technology have made it possible for industries 
and professions to avail themselves of a variety of media and 
associated storage and reproduction equipment. For 
example, in the legal profession, access to information 
stored on multiple and diverse types of media is crucial to 
research, the generation and storage of documents, and the 
gathering and evaluation of evidence. Also, success at trial 
often depends upon the ability of the presenting attorney to 
quickly locate and reproduce a critical piece of evidence, 
from what is typically a diverse collection of very large 
quantities of material, including, but not limited to, hard
copy (paper documents, such as contracts, receipts, letters, 
manuscripts, etc.), photographs, audio and video storage 
media, and computer-accessible storage media. 

In order to facilitate this information accessing task, one
or more electronic information storage and retrieval devices,
such as document scanners, opto-electronic image digitizers,
large screen displays and the like, that allow substantially 
any piece of information, regardless of its original physical 
characteristics, to be stored and retrieved in an efficient and 
organized manner, have become commonplace pieces of 
courtroom equipment. However, because the format of the
information stored in one type of database for playback by 
an associated reproduction device is not necessarily com- 
patible with the format used by another data base and its 
associated playback device, accessing different pieces of 
information for presentation to a viewer currently requires 
the use of a number of separate, stand-alone pieces of equipment,
each of which has its own control software. 

For example, the format of a text database file, such as 
that of a contract or will, derived from an opto-electronic 
scanner, is not customarily compatible with the format of a 
still image or dynamic image database file, such as one 
derived from a digitized photograph, computer graphics 
image animation, or digitized imagery data frame from a 
video camera. As a consequence, the process of retrieving 
diversely formatted electronically stored information is a 
cumbersome and time-consuming one, requiring the use of one
or more separate software packages for each media and
information type in the course of operating the appropriate
storage and retrieval device, to enable the information to be 
accessed and reproduced. 

This problem is exacerbated when the imagery informa- 
tion of interest has been captured on video tape, since 
locating a given scene or image clip on video tape often 
entails a repetitive series of fast forward, look, and rewind
operations of a video cassette recorder (VCR). Even when
equipped with a mechanism for providing a time line index
of respective frames or scenes on video tape, there is still the
need to wait while the VCR physically transports the tape
from one clip location to another. 

One way to solve this problem is to transfer the imagery 
information stored on video tape to a faster access, mass 
storage medium, such as a laser platter or disc (CD-ROM).
However, because a laser recording medium is a write-once
storage device, whenever it is desired to modify or
update any of the stored information, it is necessary both to
"burn" a new laser disc or platter and to enter new
parameter data employed in the access and playback control
software of the reproduction device, since the location where
the information was stored on the previous medium has 
changed on the new medium as the result of the update. 

Thus, even though there exist various storage media and 
access devices for autonomously electronically storing and
reproducing multiple forms of media image information, to
date there has been no single or unitary system for integrat- 
ing diverse pieces of information sourcing, storage and 
playback equipment and in a manner that allows any piece 
of information, irrespective of its original format and
medium in which it is supplied to the user, to be stored in a
manner that allows it to be expeditiously located in a storage 
database, retrieved and played back on an image reproduc- 
tion device. 

SUMMARY OF THE INVENTION

In accordance with the present invention, the above 
problem is successfully addressed by a new and improved 
information storage and retrieval system, which employs a
graphical user interface through which data obtained from
diverse types of media image generation equipment is
converted to both a user-defined storage index format and a 
supplemental text description-based format, that facilitates 
rapid retrieval of any type of data, regardless of its original 
format (e.g. text, picture, text-picture combination, video,
audio) and regardless of the capture medium or source from 
which it is imported into the system. 

The overall system architecture of the storage and 
retrieval system of the present invention includes a
processor-based network server, which operates as a system
interface between one or more user control terminals, a 
variety of media image input/output devices and an atten- 
dant mass media image store. The fundamental media image 
input/output control mechanism of the system is performed
through one or more local view stations, through which
users of the system may control the operation of the network 
server, for storing and/or retrieving media images to be 
displayed, exported or printed.

Each view station may comprise a processor-based video
display terminal having a keyboard/mouse arrangement, and
an associated display device, and interfaces with the network 
server via a local area network bus. The operational control 
program for each media image view station includes a 
supervisory media image manipulation and control program,
into which subordinate media image manipulation programs
for use with various media and media image formats are 
embedded, so that a view station user may operate a variety 
of media image processing programs through a single, 
supervisory graphical user interface. Such subordinate
embedded programs include a media image file annotation
program that allows the user to mark-up/edit documents, an 
image indexing program for indexing a media image and 
performing a free text search, and a multiple object file
viewer for importing images having different formats from
various media image sources.

In order to import and store media images from different
media, the network server is coupled to a processor-based
view/scan/capture station, which is interfaced with a
plurality of information sources, such as a digitizing document
scanner, an audio input unit, a video camera and an auxiliary
input device. Like the other media image processing
software, each of the media image capture programs
employed by the view/scan/capture station is accessible
through the supervisory program of the user view stations.

The audio input may be derived from a microphone
associated with the video camera and is processed by a voice
recognition and text conversion correlation operator to
provide a search and identification mechanism for rapidly
locating image clips in video data generated by the video
camera. When voice from a video camera microphone is
imported, the voice/speech signals are processed through a
voice recognition-to-text translation routine, so as to
generate a 'voice-text' file.

The network server is further coupled to a print/fax
station, which is interfaced with a printer and an external
communications link (telephone line) through which
database-sourced media image files may be imported to and
exported from the network server. Imported video data is
stored in an attendant mass data store, such as a
multi-gigabyte memory. The network server is further coupled to
an object server, which controls access to all the other media
types on the system.

Irrespective of their type or original data format, all media
inputs to the system are processed by means of a text
generator in order to create a 'text' file for that media image.
The generation of a text version of each media image
enables the use of a free text search operator to locate any
media image file. The free text search mechanism program
may comprise an indexing and text search program, through
which a user at a view station may perform a free text search
using Boolean and fuzzy logic parameters. By virtue of the
fact that each stored media image file, regardless of its
original format, is processed to generate an accompanying
text file, then in the event the view station operator lacks a
priori knowledge of an index search folder and its attendant
description fields, he may still be able to retrieve the media
image using the free text operator, since the text file and base
file are stored together in association with the index search
folder for the media image of interest.

Since any text-containing document may include, in
addition to standard typed or printed text, other 'text'-type
marks, such as a date stamp, signatures, hand notations etc.,
any scanned document is processed through an
optical character recognition operator program, so that the
resulting text file for the media image of interest will include
not only standard typed or printed text but all text-type
markings on the document being scanned. For general media
image inputs, such as a photograph, the output of a video
camera, or voice input, precursor text-detection processing
of the original information signals is also performed to
produce a 'text' image file, and thereby allow use of a free
text search mechanism.

For speech/voice signals, such as those provided by a
microphone associated with the video camera, in addition to
generating a base media image file that corresponds to a
digitized representation of the original voice signals, the
voice signals are also processed through a precursor voice
recognition-to-text translation routine, so as to produce an
auxiliary base 'text image' file, similar to that obtained by
the high speed scanner.

For general, non-text image media, such as photographs,
the original media image scene may or may not contain a
text object that can be detected by processing the media
image through a character recognition routine. To determine
whether an original media image contains any text, the base
media image derived from the image-containing medium is
subjected to an optical character recognition text conversion
operator. This precursor text conversion operator examines
the contents of the base media image file, which is a
digitized pixel map representation of the original scene and,
using a character recognition routine, searches the digitized
media image file for the presence of text anywhere in the
media image, thereby creating a secondary, text file for that
media image.

Once a general media image has been imported, that
media image will have two associated media image files.
The first is a base media image file obtained as a direct result
of the digitizing process carried out by the media image
import mechanism. The second is a text media image file,
resulting from the precursor text detection operator carried out,
and contains whatever text, if any, is found in the original
media image.

Advantageously, the manner in which imported media
images are processed for storage in accordance with the
present invention allows two alternative retrieval routines to
be used to access a stored media image for playback. The
first retrieval routine, termed an index search, relies upon the
ability of the user to access folder and descriptor fields
within the folder where the media image has been filed. The
second, free text search is intended to be used when the user
does not have sufficient a priori information to access the
folder and descriptor fields within the folder where the
media image has been filed. Instead, it relies upon the
contents of the text image file associated with the media
image to be retrieved.

In accordance with the index search storage routine, using
an archival document storage and retrieval graphical user
interface program, and the view station keyboard and mouse
devices, the user either opens a new folder or opens an
already existing folder in which one or more media images
may be filed. For each media image being stored within the
folder, the user enters various identification information,
including media image topic and a media image file
description. The media image index file also includes a description
field, termed a 'key word' index field. The purpose of the key
word index field is to provide a relatively concise description
of the media image, that facilitates an index search by
a user, and expedites retrieval of the media image by the free
text search mechanism, when a user lacks sufficient
information to open the folder in which the media image file is
stored.

Even though the key word index field can be prepared by
the user directly from the keyboard of the view station, it is
cumbersome, time consuming and relies upon the expertise
of the terminal operator to select the appropriate key words.
Pursuant to the present invention, the contents of the key
word index field are initially generated by a default,
non-essential word extraction subroutine, that derives the key
words of the key word index field from the contents of the
text file. According to this key word field-generation
subroutine, as each 'text' image file is generated, it is
subjected to a non-essential word extraction operation,
which reduces the contents of the media image's text file to
one or more 'key words', that are loaded into the key word
index field, but contain no auxiliary or connecting words,
such as definite or indefinite articles, prepositions and other
similar textual connectives, that may be parsed from the text
without removing its essential content describing the media
image.

The non-essential word extraction subroutine is prefer- 
ably a default operation for all text-containing media image 
files, such as text database files, digitized document files, or 
voice-image text files generated by the speech recognition-
to-text conversion operator. However, it may be turned off
for media images, such as photographs or video images, that 
typically contain only a limited, if any, quantity of alpha- 
numeric characters. Turning off the non-essential word 
extraction routine for such significantly reduced, or limited 
text content media images prevents it from excising any of
the alpha-numeric characters in the text files obtained by the 
above-described character recognition preprocessing of such 
media images. In such a case, the entire contents of the 
character recognition-derived text image are inserted 
directly into the key word index field.
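
The key word field-generation subroutine described above can be
sketched roughly as follows. This is a minimal illustration only,
not the implementation used in the described system: the function
name build_key_word_field, the contents of the stop list, and the
lower-casing of key words are all assumptions, since the text names
only articles, prepositions and similar connectives as examples of
non-essential words.

```python
import re

# Hypothetical stop list (an assumption): the description gives only
# articles, prepositions and "similar textual connectives" as examples.
NON_ESSENTIAL_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "by", "with",
    "from", "and", "or", "but", "is", "are", "was", "were", "that", "this",
})


def build_key_word_field(text, extraction_enabled=True):
    """Return the list of words to load into the key word index field.

    With extraction enabled (the default), non-essential words are
    dropped. With extraction turned off, the whole text is kept, as
    described for photographs or video frames whose recognized text
    is short.
    """
    if not extraction_enabled:
        # Insert the entire text contents unchanged.
        return text.split()
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # Lower-casing is an illustrative choice, not taken from the source.
    return [t.lower() for t in tokens if t.lower() not in NON_ESSENTIAL_WORDS]
```

For instance, build_key_word_field("The contract of sale for the
house") keeps only "contract", "sale" and "house", whereas calling it
with extraction_enabled=False on a short sign text such as "EXIT 12 A"
keeps every character group, including the single letter "A".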

According to a further feature of the present invention, 
where imported video is accompanied by voice/speech 
signals, 'voice-text' images, obtained by the voice recognition
and text conversion routine, may be used in combination
with a transcribed text file of the deposition, so that the
transcribed text file may be augmented with the time line
data of the video, thereby enabling the transcribed text file
to provide a text-based search and identification mechanism,
that is capable of rapidly locating the exact portion of the
video where a point in the testimony took place.

For this purpose, the contents of the transcribed text file 
and the contents of the voice-text file are correlated with one 
another, thereby associating the time line indices of the
video tape time line with the transcribed text file. The
transcribed text file is then augmented to include time line
indices of the video tape time line, so that the transcribed 
text file, which is derived from an essentially one hundred 
percent complete record of the videotaped deposition, can be 
used to locate both the audio contents of the video tape, and 
the associated video.
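
One way to picture this correlation step is the sketch below. It is
an illustration under stated assumptions rather than the correlation
operator of the described system: the function name
attach_time_indices and the (word, time index) data layout are
hypothetical, and Python's standard difflib.SequenceMatcher is used
as a stand-in alignment technique. Transcript words that the speech
recognizer missed or misheard simply receive no time index.

```python
from difflib import SequenceMatcher


def attach_time_indices(transcript_words, voice_text):
    """Augment transcript words with time line indices.

    transcript_words: list of words from the court reporter transcript.
    voice_text: list of (word, time_index) pairs from the recognizer.
    Returns a list of (transcript_word, time_index_or_None) pairs.
    """
    transcript = [w.lower() for w in transcript_words]
    recognized = [w.lower() for w, _ in voice_text]
    # Align the two word sequences; autojunk is disabled so that
    # frequent words are not discarded in long transcripts.
    matcher = SequenceMatcher(a=transcript, b=recognized, autojunk=False)
    times = [None] * len(transcript_words)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            # Copy the recognizer's time index onto the matching word.
            times[block.a + offset] = voice_text[block.b + offset][1]
    return list(zip(transcript_words, times))
```

A search of the augmented transcript for a phrase then yields the
time index of its first matched word, which can be used to cue the
video. Unmatched words could take the index of the nearest matched
neighbor; that refinement is omitted here.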

Pursuant to a further feature of the invention, in the course 
of storing a media image file, the contents of the media 
image file are 'hashed', so as to produce a hashing code 
representative of the digitized information contained in the
file. This hashing code is stored in a hidden field as part of 
the index file. Then, whenever a file is searched, duplicate 
copies of the media image may be rapidly located by 
invoking the hash code of any located file as a search 
parameter.
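
The duplicate-location feature can be illustrated with the short
sketch below. The description does not name a hashing algorithm, so
SHA-256 from Python's standard hashlib module is an assumption made
here, as are the IndexStore class, its method names, and the
dictionary-based index record.

```python
import hashlib


def hash_media_image(data: bytes) -> str:
    """Return a hashing code for the digitized media image contents.

    SHA-256 is an assumed choice; the source specifies no algorithm.
    """
    return hashlib.sha256(data).hexdigest()


class IndexStore:
    """Hypothetical index file store with a hidden hash code field."""

    def __init__(self):
        self._records = {}   # file name -> index record
        self._by_hash = {}   # hashing code -> list of file names

    def store(self, name: str, data: bytes) -> str:
        code = hash_media_image(data)
        # The hash code is kept alongside the other index fields.
        self._records[name] = {"name": name, "hash_code": code}
        self._by_hash.setdefault(code, []).append(name)
        return code

    def find_duplicates(self, name: str) -> list:
        """Use the located file's hash code as the search parameter."""
        code = self._records[name]["hash_code"]
        return [other for other in self._by_hash[code] if other != name]
```

Storing the same scanned page under two names and then calling
find_duplicates on either name returns the other, without comparing
the image contents byte by byte at search time.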

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 diagrammatically illustrates the overall architecture
of the storage and retrieval system of an embodiment of
the present invention;

FIG. 2 is a process flow diagram illustrating the genera- 
tion of media image text files in the course of importing 
media images from a variety of information sources; 

FIG. 3 diagrammatically illustrates the manner in which a media
image that has been imported by an import and file generation
mechanism is stored, so that it may be accessed by means
of an index search; 

FIG. 4 is a timing diagram of the relationship among 
video tape image frames, a time line containing a time index 
track associated with the (recorded video and audio) contents
of the video tape, successive words of voice-text of the
video tape as derived from a speech recognition-to-text 
conversion operator, and successive words of text of the
video tape, as derived from a court reporter transcription; 

FIG. 5 shows a correlation operation through which the 
contents of a transcribed text file and the contents of a 
voice-text file are correlated with one another to provide an 
association between time line indices of a video tape time 
line with a transcribed text file; and 

FIG. 6 is a flow diagram illustrating a hashing operation, 
through which the contents of the digitized media image 
contained in the index file are processed through a numerical 
compression algorithm (hashed) to produce a hashing code, 
representative of the media image. 

DETAILED DESCRIPTION 

Before describing in detail the new and improved (media 
image) information storage and retrieval system in accor- 
dance with the present invention, it should be observed that 
the invention resides primarily in what is effectively the 
interfacing of conventional data storage, retrieval, reproduc- 
tion and communication components, and the integration of 
their associated signal processing and media image manipu- 
lation control mechanisms, that are embedded in the opera- 
tional control software resident in the system's integrated 
computer network, which enables a user of the system to 
both efficiently store multiple types of information, deriv- 
able from diverse media, and to rapidly access any such 
stored media image for reproduction on an output device. 

Accordingly, the structure, control and arrangement of 
these conventional components and control mechanisms 
have been illustrated in the drawings by readily understand- 
able block diagrams and associated processing flow charts, 
which show only those specific details that are pertinent to 
the present invention, so as not to obscure the disclosure 
with structural details which will be readily apparent to 
those skilled in the art having the benefit of the description 
herein. Thus, the block diagram illustrations of the Figures 
do not necessarily represent the mechanical structural 
arrangement of the exemplary system, but are primarily 
intended to illustrate the major structural components of the 
system in a convenient functional grouping, whereby the 
present invention may be more readily understood. 

Moreover, since the details of the various commercially 
available programs, referenced below, are not necessary for 
an understanding of the present invention, they will not be 
described. Where specifics of any of the identified commer- 
cially available programs or hardware components 
employed in the system are desired, reference may be had to 
the supplier/manufacturer of the item of interest.

Referring now to FIG. 1, the general architecture of the
storage and retrieval system of the present invention is
diagrammatically illustrated as comprising a multi-level
network system, including a network server processor 10,
that is coupled via a local area network (LAN) 11 to various
input/output and control elements of the system, to be
described. Network server processor 10 operates as a system
interface between one or more (local or remote) user control
terminals, a media image input/output device and an attendant
media image store, as will be described. As a non-limiting
example, network server processor 10 may comprise
an Intel processor chip-based computer (e.g. an 80486/66 MHz
chip, driven by a Novell 3.12 network server
program), equipped with 64 Mb of local memory, an attendant
1 GB hard drive, a 10baseT hub, and associated
communication cables for the local area network. DOS 6.22
and WINDOWS 3.11 control software may be employed as 
the operating system program for each of the processors of 
the system. Database control software for the network server
processor 10 may comprise a Gupta SQLBase program.

The principal media image input/output control mechanism
of the system is performed through one or more local
view stations 14, which are coupled via a digital
communication bus 12 of the LAN 11 to the network server
processor 10, so that users of the system may control the
operation of the network server 10 and store and/or access
data (media images) for display. (In
addition to being accessed and controlled by one or more
local view stations 14, the network server 10 may also be
remotely accessed via a remote access interface 13.)

Each view station 14 preferably comprises a processor-based
video display terminal (VDT) having one or more
input/control devices, such as a keyboard/mouse
arrangement, and an associated display device (e.g. CRT),
together with a network card that interfaces the view station
with local area network bus 12. The (storage and retrieval)
operational control program for each media image view
station 14 comprises a supervisory media image manipulation
and control program, such as FYI by Identitech, into
which subordinate media image manipulation programs for
use with various media and media image formats are layered
or embedded, so that the user has the ability to operate a
variety of media image processing programs through a
single, supervisory controller. Such subordinate embedded
programs include an image file annotation program (e.g.,
provided by Spicer Imagination), that allows the user to
mark-up/edit documents, an image indexing program (such
as ZYIndex) that allows the user to index a media image and
perform a free text search using Boolean and fuzzy logic
parameters, a multiple object file viewer (such as FYIView
by Identitech, or Review by Pegasus) for controlling the
importing of media images having different formats from
various media image sources, and which may be supplemented
by one or more auxiliary media image access
programs (such as OUTSIDE IN).

For importing and storing images from different media,
network server processor 10 is coupled to a view/scan/capture
station processor 21. View/scan/capture station processor
21 is interfaced with a plurality of information
sources, shown as including a digitizing document scanner
23, an audio input unit 25, a video camera 27 and an
auxiliary input device 29 (such as a VCR, laser disc unit,
digitizing still camera, and the like).

As a non-limiting example, view/scan/capture station
processor 21 may comprise an Intel processor chip-based
computer (e.g. a Pentium chip) together with associated
Hauppauge, MPEG, Monies Turbo and Daughter cards. Also
included in each view station terminal is a motion pictures
expert group (MPEG) decompression card (such as an
OFTTVIew or Reelmagic card) and a network card for
interfacing the view/scan/capture station 21 with the local
area network 11.

For controlling the operation of view/scan/capture station
processor 21, a Hauppauge WIN/TV or Pegasus Image
Capture program may be employed for media image
capture. Word Scan Plus may be used for optical character
recognition, OFTTVIew or Reelmagic control software for
the corresponding MPEG decompression card, while text
format processing for the document scanner may use WinWord
for Microsoft Windows. Like the other media image
processing software described previously, each of the media
image capture programs employed by the view/scan/capture
station processor 21 is accessible through the supervisory
FYI program of the user view stations.

Digitizing document scanner 23 may comprise a high
speed digital scanner, such as a Fujitsu 3096EX or similar
high speed scanner device. The audio input 25 may be
derived from a microphone associated with video camera 27
and, as will be described, is processed by a voice recognition
and text conversion correlation routine, to provide a search
and identification mechanism that locates image clips in
video data generated by video camera 27. As an initial
processing operation, whenever audio (voice from a video
camera microphone) is input via audio source 25, the voice
signals are processed through a voice recognition-to-text
translation routine, so as to generate a text image file.

Video camera 27 itself may comprise any commercially
available [...] output [...]
containing an on-board video tape cassette that may
[...] to a VCR for playback.

Also coupled to network server processor 10 is a print/fax
processor station 31, which is operative to interface the
network server processor 10 with a printer 33, and an
external communications link (telephone line) 35 through
which media image files may be input to the network server
processor 10, or from which media image files from a
remote terminal may be coupled into the system (e.g.
transmitted by facsimile communication or electronic data
transfer). As a non-limiting example, print/fax station
processor 31 may comprise an Intel processor chip-based
computer (such as an 80486/66 MHz chip) equipped with 16
Mb of local memory, an attendant 1 GB hard drive, a
10baseT hub, a Modem/FAX card, and associated
communication cables for the local area network.

To store video data, such as that derived from video
camera 27, or a video tape downloaded from a VCR as
auxiliary input device 29, network server processor 10 is
coupled to an attendant mass data store, shown as
multi-gigabyte (e.g. a 27 GB RAID) memory 37. Network server
processor 10 is further coupled to an object server processor
41, which controls access to all of the other units of the
system. Like print/fax station processor 31, object server
processor 41 may comprise an Intel processor chip-based
computer (such as an 80486/66 MHz chip) equipped with 16
Mb of local memory, and an attendant 1 GB hard drive.

As described earlier, regardless of their type or original
data format, all media inputs to the system (supplied
predominantly through view/scan/capture station processor 21,
but also through telecommunication interface 35), are
processed to create a 'text' file for that media image, so that a
free text search operator may be employed to locate a media
image file, and thereby access the index file with which the
text file is associated.

As pointed out above, the free text search mechanism
program may comprise an off-the-shelf image indexing and
text search program (such as ZYIndex), through which a
user at a view station 14 may perform a free text search using
Boolean and fuzzy logic parameters. Because each stored
media image file, regardless of its original format, is
processed to generate an accompanying text file, even a view
station operator who has only limited information about a
media image, and who lacks a priori knowledge of the index
search folder and its attendant description fields, may still
be able to retrieve the media image using the free text
operator.

For certain types of inputs, such as a text database file
received from the print/fax station processor 31, or a
digitized document from the high speed scanner 23, for
example, the digitized media image file is essentially a text
file, upon which the free text search mechanism may
obviously operate. A typical example of an imported text file
is that of a deposition transcript supplied from a court reporter,
either via a portable storage medium (e.g. floppy disc) or,
during trial, directly into the system from the court reporter's
transcription terminal.

More particularly, as shown at step 201 in the process
flow diagram of FIG. 2, where the input media image file is
imported from a digitally formatted source, such as an
ASCII file supplied from print/fax station 31, a floppy disc
or CD-ROM supplied to a respective portable database
interface of view/scan/capture station 21, or the output of
document scanner 23, it may be understood that at least a
portion of the 'document' will contain text.

Of course, document scanner 23 could be used to scan a
medium other than a document at least a portion of which
contains text, such as a photograph of a group of people in
which no text is visible. However, such is not the intended
nor ordinary use of a (text-containing) document scanner
and, for purposes of the present description, it may be
presumed that the document scanner 23 is used in its
customary manner to scan a text-containing document,
thereby converting the contents of the scanned document
into a digital text file, as shown at text format step 203, that
is formatted in accordance with a prescribed word-processing
program, such as the WinWord program,
described above.

processing of voice signals through a voice recognition and
text translation routine to produce a 'text' image file, that
enables the originally digitized and stored voice to be
rapidly accessed and played back by the search and retrieve
mechanism of the present invention, as will be described.

For non-text, or 'general' image media, such as
photographs, the original media image scene may or may not
contain a text object (such as a name tag, label, sign or other
assembly of alpha-numeric characters located somewhere in
the media image), that can be detected by processing the
image through a character recognition routine. In an attempt
to determine whether an original general media image
contains any text, the base image derived from the
image-containing medium, such as a digitized media image
obtained from a photograph that has been scanned into the
system by a digitizing camera coupled to auxiliary input 29,
as shown at importing step 221 in FIG. 2, is subjected to
an optical character recognition text conversion operator, as
shown at step 223.

This precursor text conversion operator examines the
contents of the base media image file, which is a digitized
pixel map representation of the original scene and, using the
above-referenced character recognition routine, searches the
digitized media image file for the presence of text (one or
more alpha-numeric characters) anywhere in the media
image, thereby creating a secondary, 'text' file for that media
Since any text-containing document may include, in image. For example, if the original medium is a photograph 
addition, to standard typed or printed text, other text'-type of an automobile having its license plate visible and at least 
marks, such as a date stamp, signatures, hand notations etc, partially readable in the scene, the text file for the photo- 
any document scanned by scanner 23 is processed through 30 graph would contain at least the alpha-numeric characters of 
an optical character recognition operator (OCR) program, the license plate (and any other text present in the photo- 
such as the Word Scan Plus software, referenced above, so graphed scene). 

that the resulting text file for the media image of interest wiU Once such a media image has been imported, the system 

include not only standard typed or printed text, but all will now contain two associated media image files. The first 

text-type markings on the document being scanned. or base media image file (e.g. from a photograph) is obtained 

For general image media inputs, such as a photograph, the as a direct result of the digitizing process carried out by the 

output of a video camera, or audio (voice) input, similar media image import mechanism in step 221; its associated 

precursor text-detection processing of the original informa- secondary or text image file, resulting from the precursor 

tion signals is necessary to produce a 'text' image file, that ^ text detection operator carried out in step 223, contains 

will allow use of the free text search mechanism, whatever text, if any, is contained in the original media 

In the case of audio (speech/voice) signals, such as those image. If the original media image contains no text of any 
provided by a microphone associated with video camera 27, kind, then the text image will also contain no text, and will 
in addition to generating a base media image file be identified as being *bianktext\ However, it is still a 'text' 
(corresponding to a digitized representation of the original 45 image file. If the character recognition operator has located 
audio), shown at step 211 in FIG. 2, the (mic-sensed) voice text anywhere in the original media image, the secondary 
signals are also processed through a precursor voice 'text' image file will contain some text (at least one alpha- 
recognition-to-text translation routine, shown at step 213, numeric character). 

thereby producing an auxiliary base 'text image* file, that is As described briefly above, the manner in which imported 

similar to that obtained by high speed scanner 23 when it 50 media images are processed for storage in accordance with 

scans a text-containing document. the present invention allows two alternative retrieval rou- 

For this purpose, any commercially available voice tines to be used to access a stored media image for playback, 

recognition-to-text translation routine which achieves rea- The first retrieval routine, termed an index search, relies 

sonable performance levels may be employed. Of course, upon the ability of the user to access the folder and descrip- 

the performance level may vary depending upon character- 55 tor fields within the folder where the media image has been 

is tics of the interpreted voice signals, as well as the voice filed, using the media image data base storage program, such 

detection program employed. Still, what is paramount is the as the FYI system, referenced above. As noted earlier, the 

fact that the resulting auxiliary voice-image file is in text second retrieval routine, termed a free text search, is 

format As a result, not only can the voice-converted text file intended to be used when the user does not have sufficient 

be searched individually by the free text search m«*anicm l ^ a priori information to access the folder and descriptor fields 

but, as will be described, the contents of the voice-converted within the folder where the media image has been filed, 

text file can be used to rapidly access video images derived Instead, it relies upon the contents of the text image file 

from a video camera from a microphone of which the associated with the media image to be retrieved, 

text-converted voice signals have been obtained. FIG. 3 diagrammatically illustrates the manner in which 

It should be noted mat the voice signals need not have 65 a media image that has been imported by any of the above 

been derived from a video camera, nor must they have described import and file generation mechanisms is stored, 

associated video to be processed and retrieved. It is the so that it may be accessed by means of an index search, for 
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which an accessing user of the system has some a prion 
knowledge. In accordance with the index search storage 
routine, using an archival document storage and retrieval 
graphical user interface program, such as the FYI system, 
referenced above, via the view station keyboard and mouse 5 
devices, the user either creates a new folder or opens an 
already existing folder 301 in which one or more media 
images may be filed. 

For each media image being stored within the folder 301,
the user is interactively prompted to supply various
identification information, such as topic 303 to which the media
image relates, and a media image file description 305, such
as name, date, start of the file, end of the file fields, etc. The
media image file also includes a description field 307,
termed a 'key word' index field. The purpose of the key
word index field 307 is to provide a concise description
of the media image that facilitates an index search by a user.
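
By way of illustration only, the folder record and the two retrieval routines described above may be sketched as follows. The class name IndexRecord, its field names, and the two function names are hypothetical and are not taken from the FYI system; the sketch merely assumes that each filed media image carries a folder, a topic, a description, a key word index field and an associated text image file.

```python
from dataclasses import dataclass, field


@dataclass
class IndexRecord:
    """One filed media image (hypothetical layout modelled on items 301-307)."""
    folder: str                      # folder 301
    topic: str                       # topic 303
    description: dict                # description 305: name, date, etc.
    key_words: list = field(default_factory=list)  # key word index field 307
    text_file: str = ""              # associated 'text' image file; "" = blank text


def index_search(records, folder, key_word):
    """Index search: the user already knows the folder and a key word."""
    wanted = key_word.lower()
    return [r for r in records
            if r.folder == folder
            and wanted in (k.lower() for k in r.key_words)]


def free_text_search(records, phrase):
    """Free text search: no a priori knowledge; scan every text image file."""
    wanted = phrase.lower()
    return [r for r in records if wanted in r.text_file.lower()]
```

In this sketch a blank text file simply never matches a free text query, so such a media image remains reachable only through the index search.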

Now although the contents of the key word index field can
be prepared by the user, directly from the keyboard of the view
station 14, and this may be necessary, particularly in the case of
a photographic image containing little or no text, or a video
image having no accompanying voice, requiring the user to
create the key word index field for each imported media
image is cumbersome, time consuming and relies upon the
expertise of the terminal operator to select the appropriate
key words. Pursuant to a time and labor saving feature of the
present invention, the contents of the key word index field
are initially generated by a default, non-essential word
extraction subroutine, that derives the key words of the key
word index field from the contents of the text file.

Pursuant to this key word field-generation subroutine of
the present invention, as each 'text' image file is generated,
shown at 311 in FIG. 3, it is subjected to a non-essential
word extraction operation, shown at 313, which reduces the
contents of the media image's text file to one or more 'key
words', that are loaded into the key word index field 307, but
contain no auxiliary or connecting words, such as definite or
indefinite articles, prepositions and other similar textual
connectives, that may be parsed from the text without
removing its essential content describing the media image.

This non-essential word extraction subroutine is preferably
a default operation for all text-containing media image
files, such as text database files, digitized document files, or
voice-image text files generated by the speech
recognition-to-text conversion operator. However, it may be (selectively)
turned off for media images, such as photographs or video
images, that typically contain only a limited, if any, quantity
of alpha-numeric characters. Turning off the non-essential
word extraction routine for such significantly reduced, or
limited text content media images prevents it from excising
any of the alpha-numeric characters in the text files obtained
by the above-described character recognition preprocessing
of such media images. In this case, the entire contents of the
character recognition-derived text image are inserted
directly into the key word index field 307.
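
As a minimal sketch of the non-essential word extraction operation 313 and of its selective turn-off, the following may be considered. The word list NON_ESSENTIAL and the function name extract_key_words are assumptions made for illustration; the present description does not enumerate the connecting words that are removed.

```python
import re

# Illustrative list only: articles, prepositions and similar connectives.
NON_ESSENTIAL = {
    "a", "an", "the", "of", "in", "on", "at", "to", "for",
    "by", "with", "from", "and", "or", "but",
}


def extract_key_words(text, enabled=True):
    """Reduce a text image file to the entries of the key word index field.

    With enabled=False (e.g. for photographs or video with little text),
    every recognized token is passed through unchanged.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    if not enabled:
        return tokens
    return [t for t in tokens if t.lower() not in NON_ESSENTIAL]
```

A blank text file yields an empty list, which corresponds to the empty key word index field that the operator must then fill in manually.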

As noted previously, for limited text content media
images, it can be expected that the view station operator will
augment the contents of the key word index field, if any text
is present. Still, the amount of additional information to be
inserted into the key word index field is an option of the
operator. Obviously, a blank text file would produce a
blank or empty key word index field, and require the
operator to examine the media image (displayed on the view
station monitor) and insert appropriate key words into the
key word index field. On the other hand, if the contents of
the key word index field's default entry are determined by
the operator to be sufficient, the entry may be simply
accepted as complete. Thus, where the original media
image contains sufficient text to adequately provide a key
word description, it is unnecessary for the operator to spend
time generating or augmenting the key word index field,
thereby reducing the time and manpower required to
assemble the media image file's index folder.

As described earlier, pursuant to an additional feature of
the present invention, where imported video is accompanied
by voice/speech signals (for example, a videotaped deposition
will contain both video showing the participants during
the deposition, and accompanying voice/speech of the
questions and answers of the videotaped participants), the
'voice-text' images, obtained by the voice recognition and text
conversion routine, may be used in combination with a
transcribed text file of the deposition, so as to enable the
transcribed text file to be augmented with the time line data
of the video, thereby enabling the transcribed text file to
provide a text-based search and identification mechanism,
that is capable of rapidly locating the exact portion of the
video where a point in the conversation (e.g. testimony) of
interest took place.

Namely, the problem being addressed is how to quickly 
and precisely retrieve video tape images and accompanying 
testimony associated with those media images, where 
accompanying testimony (voice/speech) has been recorded 
(as by way of a court reporter transcript) separate from the 
video tape. 

One mechanism that has been proposed to solve the
voice-video alignment problem, per se, is described in the
Ardis et al, U.S. Pat. No. 5,172,281, entitled: 'Video Transcript
Retriever', issued Dec. 15, 1992. According to this
patented scheme, an auxiliary time line generator is used to
place timing indices on the court reporter's transcript tape,
that may be matched with those of the video tape. These time
indices are then relied upon to assign a given portion of the
transcript with a corresponding time index portion of the
video tape.

A fundamental problem with this approach is that it does 
not address the issue of the waiting time for the video tape 
to be transported from one location to another. Secondly, the 
system must operate in real time, so that the transcript time 
line will track that of the video tape. 

The search and identification mechanism of the present
invention, which avoids the problem of mechanical tape
transport and need not be carried out in real time, as in the
Ardis et al patent, may be understood by reference to the
temporal relationship diagram of FIG. 4. In the Figure, line
401 represents a sequence of video image frames (F1, F2,
F3, F4, F5, . . . ) of the imported video (such as that provided
from a video tape of a deposition). Line 403 shows a time
line containing a time index track (T1, T2, T3, T4, T5, . . . )
associated with the (recorded video and audio) contents of
the video tape. Line 405 represents successive words (Wv1,
Wv2, Wv3, . . . , Wv9, . . . ) of the voice-text of the video
taped conversation, as derived from a speech recognition-
to-text conversion operator, as described above, and line 407
represents successive words (Wt1, Wt2, Wt3, . . . ,
Wt11, . . . ) of transcribed text derived from a court reporter
transcription of the video taped deposition.

It should be noted that line 405 has a voice-text gap G1,
that falls between word Wv4 and word Wv5, and a
voice-text gap G2, that falls between word Wv8 and word Wv9.
These voice-text gaps are due to the less than one hundred
percent performance capability of current commercially
available speech recognition-to-text conversion operators to
accommodate any individual's speech characteristics
(inflection, tone, dialect, slurring, mumbling, etc.). As speech
analysis technology improves to the point where a speech
recognition-to-text conversion operator produces no gaps,
and enjoys an accuracy compatible with that produced by a
human transcriber, the transcription text will become
superfluous, and any portion of the video can be located by
a free text search of the converted voice-text file. However,
because of this performance limitation of currently
commercially available speech recognition-to-text conversion
operators, the voice-text file cannot be relied upon to search
either the voice contents of the deposition, or its video
content.

As shown in FIG. 5, to remedy this problem, the contents
of the transcribed text file (successive words Wt1, Wt2,
Wt3, . . . , Wt11, . . . of which are shown in line 407 of FIG.
4), and the contents of the voice-text file (successive words
Wv1, Wv2, Wv3, . . . , Wv9, . . . of which are shown in line 405
in FIG. 4), are coupled as respective inputs 501 and 503 to
a correlator 506, which associates the time line indices (T1,
T2, T3, . . . , in line 403 of FIG. 4) of the video tape time
line with the transcribed text file. In step 507, the transcribed
text file is augmented to include time line indices of the
video tape time line, so that the transcribed text file, which
is derived from an essentially one hundred percent
complete record of the videotaped deposition, can be used to
locate both the audio contents of the video tape, and the
associated video.
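
The correlation of FIG. 5 may be illustrated by the following sketch, which is not the correlator of the present description but one possible stand-in. It assumes that each voice-text word already carries the time index at which it was spoken, and it uses the SequenceMatcher class of Python's standard difflib module to align the two word sequences. The function name correlate, and the choice to carry the last known time index forward across voice-text gaps such as G1 and G2, are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def correlate(voice_words, transcript_words):
    """Attach video time indices to transcript words.

    voice_words:      list of (word, time_index) from speech recognition,
                      possibly with gaps (missing words).
    transcript_words: list of words from the court reporter transcript.
    Returns a list of (transcript_word, time_index or None).
    """
    v = [w.lower() for w, _ in voice_words]
    t = [w.lower() for w in transcript_words]
    times = [None] * len(t)

    # Align the two word sequences; each block is a run of identical words.
    matcher = SequenceMatcher(None, v, t, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            times[block.b + k] = voice_words[block.a + k][1]

    # Assumption: a transcript word falling in a voice-text gap inherits
    # the time index of the nearest preceding matched word.
    last = None
    for i, ti in enumerate(times):
        if ti is None:
            times[i] = last
        else:
            last = ti
    return list(zip(transcript_words, times))
```

Transcript words that precede the first matched word keep a time index of None in this sketch; a practical correlator would need some policy for that case as well.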

In the above description, it has been assumed that both the
video and audio signals were derived from the operation of a
video camera, in which the video images and the voice 
signals are customarily stored on the same magnetic tape, so 
that they share a common time line on the tape. In the 
alternative, where video and voice are obtained through 
separate camera and microphone recording components, 
there are standard synchronization mechanisms for ensuring 
a common or mutual time line for the two sets of recordings. 

It will be appreciated that the above-described correlation
of the contents of the transcribed text file with the contents
of the voice-text file, which associates the time line indices
of the video tape time line with the transcribed text file, so
that the transcribed text file can be used to rapidly locate
both the audio contents of the video taped conversation and
the associated video from a mass storage database, offers a
significant improvement over the conventional process of
repetitively performing a time-consuming series of fast
forward, look, and rewind operations of a video cassette
recorder (VCR). It should also be noted that this voice-text
database correlation mechanism need not operate in real
time with the operation of the video camera, but can be, and
is customarily, operated off-line after the video taping
session is finished.
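
Once the transcribed text file carries time line indices, locating the corresponding portion of the video reduces to a text search. A minimal sketch follows, assuming the augmented transcript is held as a list of (word, time index) pairs; the function name locate and the punctuation handling are hypothetical.

```python
def locate(augmented_transcript, phrase):
    """Return the time index at which `phrase` first occurs, else None.

    augmented_transcript: list of (word, time_index) pairs.
    """
    # Normalize case and strip trailing punctuation from transcript words.
    words = [w.lower().strip(".,?!;:") for w, _ in augmented_transcript]
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return None
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            return augmented_transcript[i][1]
    return None
```

The returned time index can then be used to seek directly into the digitized video in mass storage, rather than winding a tape.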

Pursuant to a further feature of the invention,
diagrammatically shown in the flow diagram of FIG. 6, each
respective media image stored in an index file, as shown at
601, is subjected to a 'hashing' operation, shown at 603.
Namely, the contents of the digitized media image contained 
in the index file are processed through a numerical com- 
pression (hashing) algorithm, such as that customarily 
employed in data and signal processing systems, to produce 
a hashing code, shown at 605, representative of the media 
image. This hashing code is stored in a hidden field as part 
of the index file, as shown at 607. Then, whenever a file is 
searched, duplicate copies of the stored media image may be 
rapidly located by invoking the hash code of any located file 
as a search parameter. 

Thus, for example, the same document may be stored 
under a plurality of different index file identifiers. Once any
of these documents has been retrieved by either an index
search or a free text search, whether any other copy of the 
accessed document exists may be quickly determined by use 
of the hash code embedded in the accessed file. Namely, the 
inclusion of the hash code speeds up the search query, since 
the search comparator needs only to look for a hash code 
match. 
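
The hash code mechanism of FIG. 6 may be sketched as follows. The present description does not name a particular hashing algorithm; SHA-256 from Python's standard hashlib module is used here purely as an assumed stand-in, and the dictionary layout and function names are likewise hypothetical.

```python
import hashlib


def hash_code(media_bytes):
    """Compute a hash code representative of the digitized media image."""
    return hashlib.sha256(media_bytes).hexdigest()


def file_media(index, identifier, media_bytes):
    """Store a media image under an index file identifier, with its hash
    code kept alongside it (the 'hidden field' of the index file)."""
    index[identifier] = {"media": media_bytes, "hash": hash_code(media_bytes)}


def find_duplicates(index, identifier):
    """Given one retrieved file, list every other identifier whose stored
    hash code matches, without comparing the media contents themselves."""
    target = index[identifier]["hash"]
    return sorted(other for other, rec in index.items()
                  if other != identifier and rec["hash"] == target)
```

Because only the short hash codes are compared, the duplicate query does not need to reread or compare the stored media images.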

As will be appreciated from the foregoing description, the 
present invention provides an integrated information storage 
and retrieval system, having a single graphical user interface 
through which data obtained from diverse types of media 
image generation equipments are converted to both a user- 
defined storage index format and a supplemental text 
description-based format, so as to facilitate rapid retrieval of 
any type of data, regardless of its original format (e.g. text, 
picture, text-picture combination, video, audio) and regard- 
less of the capture medium or source from which it is 
imported into the system.

While we have shown and described an embodiment in 
accordance with the present invention, it is to be understood 
that the same is not limited thereto but is susceptible to 
numerous changes and modifications as known to a person 
skilled in the art, and we therefore do not wish to be limited 
to the details shown and described herein but intend to cover 
all such changes and modifications as are obvious to one of 
ordinary skill in the art.

What is claimed: 

1. A method of processing information contained in 
different types of media, comprising the steps of: 

(a) processing each and every of said different types of 
media by storing in memory a first index storage file, 
which contains a text description of the contents of 
information contained in said any medium, and an 
identification of said first, index storage file; and 

(b) for each respective medium processed in step (a), 
analyzing the contents of information of said respective 
medium, regardless of the type of the subject matter of 
said contents of information, for the presence of text 
and generating a second, text file, that contains all text 
found in said medium and storing said second, text file 
in memory in association with said first, index storage 
file; and wherein 

said respective medium comprises a video recording 
medium containing video and voice information sig- 
nals associated with a video recorded activity, and a 
further medium containing a separate transcription of 
said voice information, and wherein 
step (b) comprises processing said voice information 
signals through an automated voice recognition-to-text 
conversion mechanism, so as to generate a voice- 
converted text file, and processing said further medium 
containing said separate transcription of said voice 
information to generate a transcribed voice text file, 
and correlating successive words of said voice- 
converted text file and successive words of said tran- 
scribed voice text file to associate contents of said 
transcribed voice text file with said video information. 
2. A method according to claim 1, wherein said video
information contains time line information, and wherein step
(b) comprises correlating said successive words of said
voice-converted text file with said successive words of said
transcribed voice text file to associate said time line
information of said video information with said transcribed voice
text file, and storing said time line information as part of said
transcribed text file.

3. A method according to claim 2, further including the
step (c) retrieving video information associated with a
transcribed text file by searching said time line information
of said transcribed text file.

4. A method of processing information supplied by way of
a different types of media, comprising the steps of:

(a) for any medium of said different types of media,
generating a first, index storage file, which contains
information contained in said any medium, an
identification of said first, index storage file and a text
description of the contents of the information contained
in said any medium, and storing said first, index storage
file in memory; and

(b) for said any medium, analyzing said information
contained therein, regardless of the type of the subject
matter of said information, for the presence of text-type
information, and generating a second, text file, that
contains all text-type information detected to be present
in the analyzed information, and storing said second,
text file in memory in association with said index
storage file, and

wherein said any medium comprises a video recording
medium containing video and voice information
signals associated with a video recorded activity, and a
further medium containing a separate transcription of
said voice information, and wherein

step (b) comprises processing said voice information
signals through an automated voice recognition-to-text
conversion mechanism, so as to generate a
voice-converted text file, and processing said further medium
containing said separate transcription of said voice
information to generate a transcribed voice text file,
and correlating successive words of said
voice-converted text file and successive words of said
transcribed voice text file to associate contents of said
transcribed voice text file with said video information.

5. A method according to claim 4, wherein said video
information contains time line information, and wherein step
(b) comprises correlating said successive words of said
voice-converted text file with said successive words of said
transcribed voice text file to associate said time line
information of said video information with words of said
transcribed voice text file, and storing said time line information
as part of said transcribed text file.

6. A method according to claim 5, further including the
step (c) retrieving video information associated with a
transcribed text file by searching said time line information
of said transcribed text file.

7. A system for processing information supplied by way
of a different types of media, comprising:

an image capture station to which said different types of
media are coupled and which is operative to generate a
digitized representation of information contained in
any medium coupled thereto;

memory in which digitized image files are storable; and

an information processor, which is coupled with said
image capture station and is operative to generate a
first, index storage file, which contains information
contained in said any medium, an identification of said
first, index storage file, and a text description of the
contents of the information contained in said any
medium, and which is operative to store said index
storage file in said memory, and wherein said information
processor is further operative to analyze said
information contained in said any medium regardless of
the type of the subject matter of said information, for
the presence of text-type information, and to generate
a second, text file, that contains all text-type information
detected to be present in the analyzed information,
and stores said second, text file in said memory in
association with said index storage file; and wherein

said any medium comprises a video recording medium
containing video and voice information signals
associated with a video recorded activity, and a further
medium containing a separate transcription of said
voice information, and wherein

said information processor includes an automated voice
recognition-to-text conversion mechanism which
converts said voice information signals to a
voice-converted text file, processes said further medium
containing said separate transcription of said voice
information to generate a transcribed voice text file,
and correlates successive words of said
voice-converted text file and successive words of said
transcribed voice text file so as to associate contents of said
transcribed voice text file with said video information.

8. A system according to claim 7, wherein said video
information contains time line information, and wherein said
information processor is operative to correlate said
successive words of said voice-converted text file with said
successive words of said transcribed voice text file to associate
said time line information of said video information with
words of said transcribed voice text file, and to store said
time line information as part of said transcribed text file.

9. A system according to claim 8, further including an
image retrieval mechanism which is operative to retrieve
video information associated with a transcribed text file by
searching said time line information of said transcribed text
file.

10. A system for processing information supplied by way
of a different types of media, comprising a processor-based
network server, which operates as a system interface
between one or more user control terminals, an image
capture station through which image input/output devices
are coupled to said network server, and a memory for storing
image files to be retrieved for reproduction on a reproduction
device, said network server operating in accordance
with a supervisory image manipulation and control program,
accessible by means of a supervisory graphical user interface
at any user control terminal, said supervisory image
manipulation and control program having embedded subordinate
image manipulation programs for different types of
media and information formats thereof, said supervisory
image manipulation and control program being operative to
generate a first, index storage file, which contains information
contained in any of said different type of media, an
identification of said first, index storage file, and a text
description of the contents of the information contained in
said any of said different type of media, and is operative to
store said index storage file in said memory, and to analyze
said information contained in said any of said different type
of media, regardless of the type of the subject matter of said
information, for the presence of text-type information, and
to generate a second, text file, that contains all text-type
information detected to be present in the analyzed
information, and stores said second, text file in said memory
in association with said index storage file, and wherein said
image capture station is coupled to import video and voice
information signals associated with a video recorded activity
from a video recording medium, and a separate transcription
of said voice information on a further medium, and wherein
said image capture station includes an automated voice
recognition-to-text conversion mechanism which converts
said voice information signals to a voice-converted text file,
and wherein said supervisory image manipulation and control
program is operative to correlate successive words of
said voice-converted text file and successive words of said
transcribed voice text file so as to associate contents of said
transcribed voice text file with said video information.

11. A system according to claim 10, wherein said video 
information contains time line information, and wherein said 
supervisory image manipulation and control program is 
operative to correlate said successive words of said voice- 
converted text file with successive words of said transcribed 
voice text file to associate said time line information of said
video information with said transcribed voice text file, and
to store said time line information as part of said transcribed 
text file. 

12. A system according to claim 11, wherein said supervisory
image manipulation and control program is operative
to retrieve video information associated with a transcribed 
text file by searching said time line information of said 
transcribed text file. 

***** 